Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector
Abstract
In this paper, we investigate the use of Multiple Background Models (M-BMs) in Speaker Verification (SV). We cluster the speakers using either their Vocal Tract Lengths (VTLs) or their speaker-specific Maximum Likelihood Linear Regression (MLLR) super-vectors, and build a separate Background Model (BM) for each cluster. We show that the use of M-BMs provides improved performance compared with a single or gender-wise Universal Background Model (UBM). While the computational complexity during testing remains the same for both M-BMs and the UBM, M-BMs require switching models depending on the claimant, and score normalization becomes difficult. To overcome these problems, we propose a novel method, inspired by the conventional Feature Mapping (FM) technique, that aggregates the information from the multiple background models into a single gender-independent UBM. We show that this approach improves on the conventional UBM method while still permitting easy use of score-normalization techniques. The proposed method provides a relative improvement in Equal Error Rate (EER) of 13.65% for VTL clustering and 15.43% for MLLR super-vector clustering when compared with the conventional single-UBM system. When AT-norm score normalization is used, the proposed method provides a relative improvement in EER of 20.96% for VTL clustering and 22.48% for MLLR super-vector based clustering. Furthermore, the proposed method is compared with a gender-dependent speaker verification system using a Gaussian Mixture Model-Support Vector Machine (GMM-SVM) super-vector linear kernel. The experimental results show that the proposed method performs better than the gender-dependent speaker verification system.
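The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: each model is reduced to a single 1-D Gaussian as a stand-in for a full GMM, speakers are clustered by a scalar VTL warp factor, and every name and number below (`assign_cluster`, `llr_score`, the bin edges, the model parameters) is a hypothetical choice made for illustration. The last helper shows how a relative EER improvement is conventionally computed; its inputs are invented and are not the paper's absolute EERs.

```python
import math
from bisect import bisect_right


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (toy stand-in for a full GMM)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def avg_loglik(frames, model):
    """Average per-frame log-likelihood under model = (mean, variance)."""
    mean, var = model
    return sum(log_gauss(x, mean, var) for x in frames) / len(frames)


def assign_cluster(warp_factor, edges):
    """Map a speaker's VTL warp factor to a cluster index via sorted bin edges."""
    return bisect_right(edges, warp_factor)


def llr_score(frames, target_model, background_models, cluster_id):
    """Log-likelihood ratio of the claimed speaker model against the
    background model of the cluster that the claimed speaker belongs to."""
    background = background_models[cluster_id]
    return avg_loglik(frames, target_model) - avg_loglik(frames, background)


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical setup: three VTL clusters, one background model per cluster.
edges = [0.95, 1.05]                                  # assumed warp-factor bin edges
background_models = [(-1.0, 4.0), (0.0, 4.0), (1.0, 4.0)]
target_model = (1.2, 1.0)                             # claimed speaker's model
cluster_id = assign_cluster(1.10, edges)              # claimed speaker's cluster

genuine_frames = [1.0, 1.3, 1.1, 1.4]                 # invented test features
impostor_frames = [-2.0, -1.5, -2.5, -1.0]

genuine_score = llr_score(genuine_frames, target_model, background_models, cluster_id)
impostor_score = llr_score(impostor_frames, target_model, background_models, cluster_id)
```

In this sketch the background model is selected by the claimant's cluster at test time, which is the model switching that the abstract identifies as a drawback of M-BMs and that the proposed single gender-independent UBM is meant to avoid.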
Related articles
Investigation of Speaker-Clustered UBMs based on Vocal Tract Lengths and MLLR matrices for Speaker Verification
It is common to use a single, large, speaker-independent Gaussian Mixture Model based Universal Background Model (GMM-UBM) as the alternative hypothesis for speaker verification tasks. The speaker models are themselves derived from the UBM using the Maximum a Posteriori (MAP) adaptation technique. During verification, the log-likelihood ratio is calculated between the target model and the GMM-UBM to accep...
Anchor and UBM-based multi-class MLLR m-vector system for speaker verification
In this paper, we propose two techniques to extend the recently introduced global Maximum Likelihood Linear Regression (MLLR) transformation (i.e. super-vector) based m-vector system for speaker verification into a multi-class MLLR m-vector system in the Universal Background Model (UBM) framework. In the first method, Gaussian mean vectors of the UBM are first grouped into several classes using ...
Sub-vector Extraction and Cascade Post-Processing for Speaker Verification Using MLLR Super-vectors
In this paper, we propose a speaker-verification system based on maximum likelihood linear regression (MLLR) super-vectors, for which speakers are characterized by m-vectors. These vectors are obtained by a uniform segmentation of the speaker MLLR super-vector using an overlapped sliding window. We consider three approaches for MLLR transformation, based on the conventional 1-best automatic tra...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Journal title:
- I. J. Speech Technology
Volume 15
Publication date 2012